Introduction to Python with Application to Bioinformatics¶

- Day 1¶

Who we are¶

Nina Dimitris Malin Jeanette Ingrid
Kostas Henrike Ashfaq Claudio Sebastian

Who you are¶

Drawing

Practical issues¶

  • Course website: https://uppsala.instructure.com/courses/71521
  • Course lectures streamed from Uppsala to Umeå
  • TAs on each site
  • Short lectures with many breaks
  • Schedule times are approximate

Schedule¶

Drawing

To start with¶

  • Write a short presentation of yourself in the HackMD

Check¶

  • Has everyone managed to install Python?
  • Have you managed to run the test script?
  • Have you installed notebooks? (optional)

What is programming?¶

Wikipedia:

"Computer programming is the process of building and designing an executable computer program for accomplishing a specific computing task"

What can we use it for?¶

Endless possibilities!

  • reverse complement DNA
  • custom filtering of VCF files
  • plotting of results
  • all Excel stuff!
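
As a taste of the first item, here is a minimal sketch of reverse-complementing a DNA string. The function name `reverse_complement` and the example sequence are made up for illustration, and the sketch assumes the sequence contains only uppercase A, C, G and T. Functions and dictionaries are covered later in the course.

```python
# Hypothetical helper for illustration; assumes only uppercase A, C, G, T
def reverse_complement(seq):
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    # Walk the sequence backwards and swap each base for its partner
    return ''.join(complement[base] for base in reversed(seq))

rc = reverse_complement('ATGC')
print(rc)  # GCAT
```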

Why Python?¶

Typical workflow¶

  1. Get data
  2. Clean, transform data in spreadsheet
  3. Copy-paste, copy-paste, copy-paste
  4. Run analysis & export results
  5. Realise the columns were not sorted correctly
  6. Go back to step 2, Repeat

Python versions¶

Old versions                        Python 3
Python 1.0 - January 1994           Python 3.0 - December 3, 2008
Python 1.2 - April 10, 1995         Python 3.1 - June 27, 2009
Python 1.3 - October 12, 1995       Python 3.2 - February 20, 2011
Python 1.4 - October 25, 1996       Python 3.3 - September 29, 2012
Python 1.5 - December 31, 1997      Python 3.4 - March 16, 2014
Python 1.6 - September 5, 2000      Python 3.5 - September 13, 2015
Python 2.0 - October 16, 2000       Python 3.6 - December 23, 2016
Python 2.1 - April 17, 2001         Python 3.7 - June 27, 2018
Python 2.2 - December 21, 2001      Python 3.8 - October 14, 2019
Python 2.3 - July 29, 2003          Python 3.9 - October 5, 2020
Python 2.4 - November 30, 2004      Python 3.10 - October 4, 2021
Python 2.5 - September 19, 2006
Python 2.6 - October 1, 2008
Python 2.7 - July 3, 2010

Course content

  • Core concepts about Python syntax: Data types, blocks and indentation, variable scoping, iteration, functions, methods and arguments
  • Different ways to control program flow using loops and conditional tests
  • Regular expressions and pattern matching
  • Writing functions and best-practice ways of making them usable
  • Reading from and writing to files
  • Code packaging and Python libraries
  • How to work with biological data using external libraries.

Learning outcomes

After this course you should be able to:

  • Describe and apply basic concepts in Python, such as:
    • Variables
    • Operators
    • Loops
    • If/else statements
    • Functions
    • Reading/writing to files
  • Edit and run Python code
  • Write file-processing Python programs that produce output to the terminal and/or external files
  • Create stand-alone Python programs to process biological data
  • Know how to develop your skills in Python after the course (including debugging)

Some good advice¶

  • 5 days to learn Python is not much
  • Amount of information will decrease over days
  • Complexity of tasks will increase over days
  • Read the error messages!
  • Save all your code

How to seek help:

  • Google
  • Ask your neighbour
  • Ask an assistant

You will look like this:¶


Drawing

Day 1¶

  • Types and variables
  • Operations
  • Loops
  • if/else statements

Example of a simple Python script¶

In [6]:
# A simple loop that adds 2 to a number
i = 0
while i < 10:
    u = i + 2
    print('u is ' + str(u))
    i += 1
u is 2
u is 3
u is 4
u is 5
u is 6
u is 7
u is 8
u is 9
u is 10
u is 11
u is11

Comment¶

Any line starting with # is interpreted by Python as a comment and is not executed. Comments are important for documenting code and are considered good practice in all types of programming.
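
A small sketch of the two common comment styles; the variable name `count` is made up for illustration.

```python
# A full-line comment: Python skips this line entirely
count = 3  # an inline comment: everything after the # is ignored
# count = 100   <- commented-out code is not executed either
print(count)  # 3
```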

Literals¶

All literals have a type:

  • Strings (str)       'Hello' "Hi"
  • Integers (int)      5
  • Floats (float)      3.14
  • Booleans (bool)     True or False

Literals define values¶

In [7]:
'this is a string'
"this is also a string"
3       # here we can put a comment so we know that this is an integer
3.14    # this is a float
True    # this is a boolean

type(True)
Out[7]:
bool
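
A notebook cell only shows the value of its last line, so here is a sketch that prints the type of every literal. Note that print() shows the type as `<class '...'>` rather than the bare name seen in the Out cell.

```python
print(type('this is a string'))  # <class 'str'>
print(type(3))                   # <class 'int'>
print(type(3.14))                # <class 'float'>
print(type(True))                # <class 'bool'>
```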

Collections¶

In [8]:
[3, 5, 7, 4, 99]       # this is a list of integers

('a', 'b', 'c', 'd')   # this is a tuple of strings
{'a', 'b', 'c'}        # this is a set of strings
{'a':3, 'b':5, 'c':7}  # this is a dictionary with strings as keys and integers as values

type([3, 5, 7, 4, 99])
Out[8]:
list
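
A short sketch of what can be done with each collection once it has a name. The variable names are made up for illustration.

```python
numbers = [3, 5, 7, 4, 99]
letters = ('a', 'b', 'c', 'd')
unique  = {'a', 'b', 'c', 'a'}        # the duplicate 'a' is dropped
lookup  = {'a': 3, 'b': 5, 'c': 7}

print(len(numbers))   # 5
print(letters[0])     # a
print(len(unique))    # 3
print(lookup['b'])    # 5
```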

What operations can we do with different values?¶

That depends on their type:

In [10]:
'a string'+' another string'
2 + 3.4
'a string ' * 3
#'a string ' * 3.4
Out[10]:
'a string a string a string '

Type         Operations

int          + - * / ** % // ...
float        + - * / ** % // ...
string       + *
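
The commented-out line in the cell above fails if it is run. A minimal sketch of what happens, with the error caught so that the example runs to the end (try/except is not part of Day 1):

```python
print(2 + 3.4)           # int + float gives a float
print('ab' * 3)          # ababab

try:
    'a string ' * 3.4    # a string cannot be repeated 3.4 times
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```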

Identifiers¶

Identifiers are used to identify a program element in the code.

For example:

  • Variables
  • Functions
  • Modules
  • Classes

Variables¶

Used to store values and to assign them a name.

Examples:

  • i = 0
  • counter = 5
  • snpname = 'rs2315487'
  • snplist = ['rs21354', 'rs214569']
In [11]:
width  = 23564
height = 20

snpname = 'rs56483 '
snplist = ['rs12345','rs458782']

snpname * 3
Out[11]:
'rs56483 rs56483 rs56483 '

How to correctly name a variable¶

Allowed:              Not allowed:
Var_name              2save
_total                *important
aReallyLongName       Special%
with_digit_2          With   spaces
dkfsjdsklut   (well, allowed, but NOT recommended)

NO special characters:
+ - * $ % ; : , ? ! { } ( ) < > " ' | \ / @
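
One way to test a candidate name is the string method isidentifier(). This is a sketch only: the method checks the naming rules but also returns True for reserved keywords, so it does not catch every bad name.

```python
candidates = ['Var_name', '_total', 'with_digit_2', '2save', 'Special%', 'With spaces']

for name in candidates:
    # True when the string follows the naming rules above
    print(name, name.isidentifier())
```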

Reserved keywords¶

Python's reserved keywords, such as if, else, for, while and True, cannot be used as variable names.
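
The full list of reserved words for the installed Python version can be printed with the standard library module keyword. A minimal sketch:

```python
import keyword

print(keyword.kwlist)                 # all reserved words in this Python version
print(keyword.iskeyword('for'))       # True
print(keyword.iskeyword('snpname'))   # False
```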

Summary¶

  • Comment your code!
  • Literals define values and can have different types (strings, integers, floats, booleans)
  • Values can be collected in lists, tuples, sets, and dictionaries
  • The operations that can be performed on a value depend on its type
  • Variables are identified by a name and are used to store a value or collections of values
  • Name your variables using descriptive words without special characters and reserved keywords

→ Notebook Day_1_Exercise_1 (~30 minutes)

NOTE!¶

How to get help?¶

  • Google and Stack Overflow are your best friends!
  • Official Python documentation
  • Ask your neighbour
  • Ask us

Python standard library¶

Drawing

Example print() and str()¶

Drawing

Note!
Here we format everything to a string before printing it
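
A minimal sketch of why the conversion is needed when joining with +. The variable `width` and its value are borrowed from an earlier cell, and the error is caught so that the example runs to the end.

```python
width = 23564

print('The width is ' + str(width))   # The width is 23564

try:
    print('The width is ' + width)    # str + int is not allowed
except TypeError:
    print('Convert the number with str() first')
```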

In [13]:
width  = 5
height = 3.6
snps   = ['rs123', 'rs5487']
snp    = 'rs2546'
active = True
nums   = [2,4,6,8,4,5,2]

#sum(width)
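
A few built-in functions tried on the variables from the cell above. This is a sketch: which functions the slide lists is not visible here, so the selection is an assumption. The commented-out sum(width) fails because sum() needs a collection, not a single number.

```python
width  = 5
height = 3.6
snps   = ['rs123', 'rs5487']
nums   = [2, 4, 6, 8, 4, 5, 2]

print(len(snps))      # 2
print(sum(nums))      # 31
print(max(nums))      # 8
print(round(height))  # 4
print(sorted(nums))   # [2, 2, 4, 4, 5, 6, 8]

try:
    sum(width)        # sum() needs something to loop over, not a single int
except TypeError as err:
    print('TypeError:', err)
```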

More on operations¶

Drawing

In [14]:
x = 4
y = 3
z = [2, 3, 6, 3, 9, 23]
pow(x, y)
Out[14]:
64
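
A sketch of a few more operations on the same variables, mixing operators and built-in functions:

```python
x = 4
y = 3
z = [2, 3, 6, 3, 9, 23]

print(x ** y)        # 64, same as pow(x, y)
print(x // y)        # 1, floor division
print(x % y)         # 1, remainder
print(divmod(x, y))  # (1, 1), quotient and remainder together
print(abs(-x))       # 4
print(sum(z))        # 46
```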

Comparison operators¶

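  • ==   equal to
  • !=   not equal to
  • <    less than
  • >    greater than
  • <=   less than or equal to
  • >=   greater than or equal to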

Can be used on int, float, str, and bool. Outputs a boolean.

In [15]:
x = 5
y = 3

y == x
Out[15]:
False
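
A few more comparisons as a sketch, including different types:

```python
x = 5
y = 3

print(x != y)             # True
print(x >= y)             # True
print(1 == 1.0)           # True, int and float are compared by value
print('apple' < 'pear')   # True, strings are compared character by character
print('a' == 'A')         # False, comparison is case-sensitive
```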

Logical operators¶

  • and   True if both conditions are true
  • or    True if at least one condition is true
  • not   inverts a boolean

Membership operators¶

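  • in       True if the value is found in the collection
  • not in   True if the value is not found in the collection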

In [16]:
x = 2
y = 3

x == 2 or y == 5

x = [2,4,7,3,5,9]
y = ['a','b','c']

2 in x
4 in x and 'd' in y
Out[16]:
False
In [17]:
# A simple loop that adds 2 to a number and checks if the number is even
i    = 0
even = [2,4,6,8,10]
while i < 10:
    u = i + 2
    print('u is '+str(u)+'. Is this number even? '+str(u in even))
    i += 1
u is 2. Is this number even? True
u is 3. Is this number even? False
u is 4. Is this number even? True
u is 5. Is this number even? False
u is 6. Is this number even? True
u is 7. Is this number even? False
u is 8. Is this number even? True
u is 9. Is this number even? False
u is 10. Is this number even? True
u is 11. Is this number even? False
In [18]:
# A simple loop that adds 2 to a number, check if number is even and below 5
i    = 0
even = [2,4,6,8,10]
while i < 10:
    u = i + 2
    print('u is '+str(u)+'. Is this number even and below 5? '+\
          str(u in even and u < 5))
    i += 1
u is 2. Is this number even and below 5? True
u is 3. Is this number even and below 5? False
u is 4. Is this number even and below 5? True
u is 5. Is this number even and below 5? False
u is 6. Is this number even and below 5? False
u is 7. Is this number even and below 5? False
u is 8. Is this number even and below 5? False
u is 9. Is this number even and below 5? False
u is 10. Is this number even and below 5? False
u is 11. Is this number even and below 5? False

Order of precedence¶

There is an order of precedence for all operators:

Drawing
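
A few cases as a sketch of how precedence changes a result. When in doubt, add parentheses.

```python
print(2 + 3 * 4)      # 14, * binds stronger than +
print((2 + 3) * 4)    # 20
print(-2 ** 2)        # -4, ** binds stronger than the minus sign
print(2 + 3 > 4)      # True, arithmetic happens before comparison
print(not 1 == 2)     # True, comparison happens before not
```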

Word of caution when using operators¶

In [19]:
x = 5
y = 7
z = 2
x == 5 and y < 7 or z > 1

# and binds stronger than or
x > 4 or y == 6 and z > 3
x > 4 or (y == 6 and z > 3)
(x > 4 or y == 6) and z > 3
Out[19]:
False
In [20]:
# BEWARE!
x = 5
y = 8

#xx == 6 or xxx == 6 or x > 2
x > 42 or (y < 7 and xx > 1000)
Out[20]:
False

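Python uses short-circuit evaluation for the logical operators and and or: the right-hand side is only evaluated when the left-hand side does not already decide the result. That is why the undefined name xx causes no error above.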
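
A sketch that makes the short-circuit visible. The name `undefined_name` is deliberately never assigned.

```python
x = 5
y = 8

# undefined_name does not exist, but Python never looks at it here
print(x > 2 or undefined_name == 6)      # True
print(y < 7 and undefined_name > 1000)   # False

try:
    undefined_name == 6 or x > 2         # now it has to be evaluated first
except NameError:
    print('NameError: the left side is always evaluated')
```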

More on sequences (For example strings and lists)¶

Lists (and strings) are ORDERED collections of elements where every element can be accessed through an index.

In [23]:
l = [2,3,4,5,3,7,5,9]
s = 'some longrandomstring'

'o' in s
l[2]
s[0:7]
s[0:8:2]
s[-1]
l[0] = 42
#s[0] = 'S'
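
The cell above shows no output, so here is the same code as a sketch with each result printed. The last assignment is the commented-out line, with its error caught.

```python
l = [2, 3, 4, 5, 3, 7, 5, 9]
s = 'some longrandomstring'

print(l[2])       # 4
print(s[0:7])     # some lo
print(s[0:8:2])   # sm o
print(s[-1])      # g

l[0] = 42         # works, lists are mutable
print(l[0])       # 42

try:
    s[0] = 'S'    # fails, strings are immutable
except TypeError:
    print('Strings cannot be changed in place')
```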

Mutable vs Immutable objects¶


Mutable objects can be altered after creation, while immutable objects can't.

Immutable objects:       Mutable objects:

  • int                    • list
  • float                  • set
  • bool                   • dict
  • str
  • tuple
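
A minimal sketch of the difference in practice. The variable names and values are made up for illustration.

```python
snplist = ['rs12345', 'rs458782']
snplist[0] = 'rs99999'          # the list itself is changed
print(snplist)                  # ['rs99999', 'rs458782']

snpname = 'rs56483'
upper_name = snpname.upper()    # a new string is created
print(snpname)                  # rs56483, the original is untouched
print(upper_name)               # RS56483

coords = (10, 20)
try:
    coords[0] = 99              # tuples cannot be changed
except TypeError:
    print('Tuples are immutable')
```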

Operations on mutable sequences¶

In [24]:
s = [0,1,2,3,4,5,6,7,8,9]
s.insert(5,10)
#s.reverse()
s.append(10)
s
Out[24]:
[0, 1, 2, 3, 4, 10, 5, 6, 7, 8, 9, 10]
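
A few more list methods as a sketch. All of them change the list in place.

```python
s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

s.reverse()          # reverse in place
print(s)             # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
last = s.pop()       # remove and return the last element
print(last)          # 0
s.remove(5)          # remove the first occurrence of the value 5
s.sort()             # sort in place
print(s)             # [1, 2, 3, 4, 6, 7, 8, 9]
```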

Summary¶

  • The Python standard library has many commonly used built-in functions
  • Operators are used to carry out computations on different values
  • Three types of operators were covered: comparison, logical, and membership
  • Order of precedence is crucial!
  • Mutable objects can be changed after creation while immutable objects cannot be changed



→ Notebook Day_1_Exercise_2 (~30 minutes)

Loops in Python¶

In [25]:
fruits = ['apple','pear','banana','orange', 'grapes']

print(fruits[0])
print(fruits[1])
print(fruits[2])
print(fruits[3])
print(fruits[4])
apple
pear
banana
orange
grapes
In [26]:
fruits = ['apple','pear','banana','orange', 'grapes']

for fruit in fruits:
    print(fruit)
print('hello')
print('done')
apple
pear
banana
orange
grapes
hello
done

Always remember to INDENT your loops!

Different types of loops¶

For loop¶

In [27]:
fruits = ['apple','pear','banana','orange']
mystring = 'mylongstring'

for fruit in fruits:
    print(fruit)
apple
pear
banana
orange

While loop¶

In [28]:
fruits = ['apple','pear','banana','orange']

i = 0
while i < len(fruits):
    print(fruits[i])
    i = i + 1

print(i)
apple
pear
banana
orange
4

Different types of loops¶

For loop

Is a control flow statement that performs a fixed operation over a known number of steps.

While loop

Is a control flow statement that allows code to be executed repeatedly based on a given Boolean condition.


Which one to use?

For loops are better for simple iterations over lists and other iterable objects

While loops are more flexible and can iterate an unspecified number of times
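
A sketch of one case for each loop type. The numbers are made up for illustration.

```python
# for: the number of steps is known (one per element)
total = 0
for number in [2, 4, 6]:
    total = total + number
print(total)   # 12

# while: we do not know in advance how many halvings are needed
value = 100
steps = 0
while value > 1:
    value = value / 2
    steps = steps + 1
print(steps)   # 7
```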

→ Notebook Day_1_Exercise_3 (~20 minutes)

Conditional if/else statements

In [29]:
shopping_list = ['bread', 'egg', 'butter', 'milk']

if len(shopping_list) > 3:
    print('Go shopping!')
else:
    print('Nah! I\'ll do it tomorrow!')
Go shopping!
In [30]:
shopping_list = ['bread', 'egg', 'butter', 'milk']
tired         = True

if len(shopping_list) > 3:
    if not tired:
        print('Go shopping!')
    else:
        print('Too tired, I\'ll do it later')
else:
    if not tired:
        print('Better get it over with today anyway')
    else:
        print('Nah! I\'ll do it tomorrow!')
Too tired, I'll do it later

This is an example of a nested conditional¶
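
Nesting is not the only option. As a sketch, the same decisions can be written flat by combining conditions with and and chaining them with elif, which has not been introduced in these slides. The helper variables `long_list` and `message` are made up for illustration.

```python
shopping_list = ['bread', 'egg', 'butter', 'milk']
tired         = True

long_list = len(shopping_list) > 3

if long_list and not tired:
    message = 'Go shopping!'
elif long_list and tired:
    message = "Too tired, I'll do it later"
elif not tired:
    message = 'Better get it over with today anyway'
else:
    message = "Nah! I'll do it tomorrow!"

print(message)   # Too tired, I'll do it later
```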

Putting everything into a Python script¶

Any longer pieces of code that have been used and will be re-used SHOULD be saved

Two options:

  • Save it as a text file and make it executable
  • Save it as a notebook file

Things to remember when working with scripts¶

  • Put #!/usr/bin/env python at the beginning of the file
  • Make the file executable (for example with chmod +x script.py) to run it with ./script.py
  • Otherwise run script with python script.py
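
A minimal sketch of what such a file could contain. The file name script.py and its contents are made up for illustration.

```python
#!/usr/bin/env python
# Hypothetical contents of script.py

fruits = ['apple', 'pear', 'banana', 'orange']

for fruit in fruits:
    print(fruit)
```

Saved as script.py, this prints one fruit per line when run from the terminal.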

Working on files¶

In [31]:
fruits = ['apple','pear','banana','orange']

for fruit in fruits:
    print(fruit)
apple
pear
banana
orange

In [32]:
fh = open('../files/fruits.txt', 'r', encoding = 'utf-8')

for line in fh:
    print(line)

fh.close()
apple

pear

banana

orange

Additional useful methods:¶

'string'.strip()       Removes leading and trailing whitespace
'string'.split()       Splits on whitespace into a list

In [33]:
s  = '  an example string to split with whitespace in end   '
sw = s.strip()
sw
l  = s.split()
#l  = s.strip().split()
l
Out[33]:
['an', 'example', 'string', 'to', 'split', 'with', 'whitespace', 'in', 'end']

In [34]:
fh = open('../files/fruits.txt', 'r', encoding = 'utf-8')

for line in fh:
    print(line.strip())

fh.close()
apple
pear
banana
orange
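
Forgetting to close a file is easy. An alternative, not used in these slides, is the with statement, which closes the file automatically. The sketch below first writes a small stand-in for fruits.txt to a temporary folder so that it runs anywhere; the path is an assumption.

```python
import os
import tempfile

# Create a small stand-in for fruits.txt so the example runs anywhere
path = os.path.join(tempfile.mkdtemp(), 'fruits.txt')
with open(path, 'w', encoding='utf-8') as out:
    out.write('apple\npear\nbanana\norange\n')

fruits = []
with open(path, 'r', encoding='utf-8') as fh:
    for line in fh:
        fruits.append(line.strip())   # strip() removes the trailing newline

print(fruits)      # ['apple', 'pear', 'banana', 'orange']
print(fh.closed)   # True, the file was closed automatically
```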

Another example¶

How much money is spent on ICA?

In [35]:
fh    = open("../files/bank_statement.txt", "r", encoding = "utf-8")

total = 0
times = 0

for line in fh:
    expenses = line.strip().split()  # split line into list
    store    = expenses[0]           # save what store
    price    = float(expenses[1])    # save the price
    if store == 'ICA':               # only count the price if store is ICA
        times = times + 1
        total = total + price
fh.close()

print('Total amount spent on ICA is: '+str(total))
print(times)
Total amount spent on ICA is: 1186.71
3
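
To see what happens inside the loop, here is a sketch that processes a single line. The line is made up and assumes the format "store price" separated by whitespace.

```python
line = 'ICA 235.50\n'   # made-up line in the assumed "store price" format

expenses = line.strip().split()   # remove the newline, then split into a list
store    = expenses[0]
price    = float(expenses[1])     # convert the text '235.50' to a number

print(expenses)  # ['ICA', '235.50']
print(price)     # 235.5
```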

Slightly more complex...¶

How much money is spent on ICA in September?

In [36]:
fh    = open("../files/bank_statement_extended.txt", "r", encoding = "utf-8")

total = 0

for line in fh:
    if not line.startswith('store'):
        expenses = line.strip().split()
        store    = expenses[0]
        year     = expenses[1]
        month    = expenses[2]
        day      = expenses[3]
        price    = float(expenses[4])
        if store == 'ICA' and month == '09':   # store has to be ICA and month September
            total = total + price
fh.close()

out = open("../files/bank_statement_results.txt", "w", encoding = "utf-8")   # open a file for writing the results to
out.write('Total amount spent on ICA in september is: '+str(total))
out.close()
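
A sketch of writing a result and reading it back to check it. The total and the output path are made up for illustration, and a temporary folder is used so that no course file is overwritten.

```python
import os
import tempfile

total = 1186.71   # made-up value for illustration

# Hypothetical output location in a temporary folder
path = os.path.join(tempfile.mkdtemp(), 'results.txt')

out = open(path, 'w', encoding='utf-8')
out.write('Total: ' + str(total) + '\n')   # write() does not add a newline itself
out.close()

fh = open(path, 'r', encoding='utf-8')
content = fh.read()
fh.close()
print(content.strip())   # Total: 1186.71
```
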
In [37]:
import os
import time
for file in os.scandir("../files/"):
    print(time.ctime(os.stat(file).st_mtime), '\t', file.name)
Thu May 20 17:46:00 2021 	 bank_statement.txt
Thu May 20 17:46:00 2021 	 bank_statement_extended.txt
Thu Sep  8 16:31:04 2022 	 bank_statement_results.txt
Thu May 20 17:46:00 2021 	 blocket_listings_selected.txt
Thu May 20 17:46:01 2021 	 cheat_sheet.pdf
Thu May 20 17:46:01 2021 	 fruits.txt
Thu May 20 17:46:01 2021 	 fruits_extended.txt
Thu Oct 21 12:53:44 2021 	 schedule.csv
Thu May 20 17:46:01 2021 	 somerandomfile.txt

Summary¶

  • Python has two types of loops, for loops and while loops
  • Loops can be used on any iterable types and objects
  • If/else statements are used when deciding actions depending on a condition that evaluates to a boolean
  • Several if/else statements can be nested
  • Save code as a notebook or as a text file to be run using Python
  • The function open() can be used to read in text files
  • A text file is iterable, meaning it is possible to loop over the lines

→ Notebook Day_1_Exercise_4